A Theory of Hygienic Macros

Authors

  • David Herman
  • Mitchell Wand
Abstract

Hygienic macro systems, such as Scheme’s, automatically rename variables to prevent unintentional variable capture; in short, they “just work.” Yet hygiene has never been formally presented as a specification rather than an algorithm. According to folklore, the definition of hygienic macro expansion hinges on the preservation of alpha-equivalence. But the only known notion of alpha-equivalence for programs with macros depends on the results of macro expansion! We break this circularity by introducing explicit binding specifications into the syntax of macro definitions, permitting a definition of alpha-equivalence independent of expansion. We define a semantics for a first-order subset of Scheme-like macros and prove hygiene as a consequence of confluence.

“The subject of macro hygiene is not at all decided, and more research is needed to precisely state what hygiene formally means and [precisely which] assurances it provides.” (Oleg Kiselyov [1])

1 What are Hygienic Macros?

Programming languages with hygienic macros automatically rename variables to prevent subtle but common bugs arising from unintentional variable capture; the experience of the practical programmer is that hygienic macros “just work.” Numerous macro expansion algorithms for Scheme have been developed over many years [2–6], and the Scheme standard has included hygienic macros since R4RS [7]. Yet to date, a formal specification for hygiene has been an elusive goal.

Intuitively, macro researchers have always understood hygiene to mean preserving α-equivalence. In particular, performing an α-conversion of a bound variable should not result in a macro expansion that accidentally captures the renamed variable. But this idea has never been made precise. Why should such a simple idea be so hard to formalize?
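The kind of bug that hygiene prevents can be sketched in a few lines. The following is a minimal illustration, not the expansion algorithm of any Scheme system: it assumes a hypothetical two-argument macro `my-or` whose template introduces a temporary binder, models terms as nested Python tuples, and treats `if` as an ordinary free name. All function names here are illustrative assumptions.

```python
import itertools

def free_vars(term):
    """Free variables of a term written as nested tuples.

    A term is a variable name (str), ("lam", name, body), or ("app", fn, arg).
    """
    if isinstance(term, str):
        return {term}
    if term[0] == "lam":
        return free_vars(term[2]) - {term[1]}
    return free_vars(term[1]) | free_vars(term[2])

def expand_or_naive(e1, e2):
    # (my-or e1 e2) => ((lam tmp. if tmp tmp e2) e1), always using the name "tmp".
    test = ("app", ("app", ("app", "if", "tmp"), "tmp"), e2)
    return ("app", ("lam", "tmp", test), e1)

_counter = itertools.count()

def expand_or_renaming(e1, e2):
    # Same template, but the introduced binder gets a fresh name.
    # Assumption: user programs never use "%" in identifiers.
    tmp = f"tmp%{next(_counter)}"
    test = ("app", ("app", ("app", "if", tmp), tmp), e2)
    return ("app", ("lam", tmp, test), e1)

# The user's second argument refers to their own variable named "tmp".
naive = expand_or_naive("a", "tmp")
renamed = expand_or_renaming("a", "tmp")

print(sorted(free_vars(naive)))    # ['a', 'if']
print(sorted(free_vars(renamed)))  # ['a', 'if', 'tmp']
```

In the naive expansion the user's `tmp` is no longer free: it has been captured by the binder that the macro introduced. Fresh renaming of this kind is an algorithm; it does not by itself say what property the expander is supposed to guarantee, which is the specification question raised above.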
The problem is this: since the only known binding forms in Scheme are the core forms, the binding structure of a Scheme expression does not become apparent until after it has been fully expanded to core Scheme. Thus α-equivalence is only well-defined for Scheme programs that have been fully expanded, with no remaining instances of macros. So if the conventional wisdom is correct, the definition of hygienic macro expansion relies on α-equivalence, but the definition of α-equivalence relies on the results of macro expansion! This circularity is clearly paradoxical, and the definition of hygiene has consequently remained a mystery.

But in practice, well-behaved macros follow regular binding disciplines consistently, independent of their particular expansion. For example, Scheme’s let construct can be macro-defined using lambda, yet programmers rely on knowing the binding structure of let without actually thinking about its expansion. If the semantics of macros only had access to this binding structure in such a way that we could reason formally about the scope of Scheme programs without resorting to operational reasoning about their expansion, we could cut the Gordian knot and specify both α-equivalence and hygiene in an intuitive and precise way.

To put it more succinctly, we argue that the binding structure of a macro is a part of its interface. In this paper, we make that interface explicit as a type annotation. Our type system is novel but incorporates ideas both from the shape types of Culpepper and Felleisen [8] and nominal datatypes of Gabbay and Pitts [9]. With the aid of these type annotations, we define a notion of α-equivalence for Scheme programs with first-order macros, i.e., macros that do not expand into subsequent macro definitions, and prove hygiene as a consequence of confluence. We discuss higher-order macros as future work in Section 9.

The organization of this paper is as follows.
The next section introduces λm, a Scheme-like language with typed macros. Section 3 defines the α-equivalence relation for λm, and Section 4 introduces the macro type system. Section 5 defines the macro expansion semantics. The next two sections present the key correctness theorems: type soundness in Section 6 and hygiene in Section 7. In Section 8 we present a front end for parsing S-expressions as λm expressions. Section 9 concludes with a discussion of related and future work.

2 λm: an Intermediate Language for Modeling Macros

In Scheme, macro expansion transforms S-expressions into a small, fixed set of core forms which the underlying compiler or interpreter is designed to recognize. Expansion eliminates uses of macros by translating them according to their definitions, repeating this process recursively until there are no derived forms left to translate. Thus macro expansion consumes programs in surface syntax:

    (let ((x (sqrt 2)))
      (let ((y (exp x)))
        (lambda (f) (f y))))

and produces programs with only the internal forms recognized by the compiler:

    ((λx. ((λy. λf. f y) (exp x))) (sqrt 2))

We use a distinct syntax for core forms to highlight the fact that they indicate the completion of macro expansion. We use S-expressions not simply to describe Scheme, but as a simple and general model of tree-structured syntax.

Because macro expansion operates on partially expanded programs, which may contain both core forms and S-expressions yet to be expanded, a model for macros must incorporate both syntactic elements. To that end, we define an intermediate language for modeling macro expansion, called λm. The core forms are based on the λ-calculus, but with additional forms for local binding of macro definitions and macro application.

    e  ::= v | λv. e | e e | let syntax x = m in e end | op⟦s⟧σ
    v  ::= x | ?a
    op ::= v | m
    m  ::= macro p : σ ⇒ e
    p  ::= ?a | (p)
    s  ::= e | op | (s)

Unlike the surface syntax of Scheme, the syntax of λm consists not just of S-expressions but also expressions e, whose syntactic structure is fixed and manifest. Of course, macros admit arbitrary syntactic extension in the form of S-expressions, so S-expressions s appear in the grammar as the arguments to macro applications. Here too, though, the syntactic structure is made apparent via a shape type annotation σ. We return in detail to shape types in Section 2.2.

Variables v come in two sorts: program variables x, which are standard, and pattern variables ?a, which are bound in macro argument patterns and used in their definitions. Thus, for example, λx. x is a traditional λ-abstraction, but λ?a. ?a might appear in the body of a macro as a λ-abstraction whose bound variable will be provided from one of the macro’s inputs. Macro operators op are either variable references or macro expressions. Macros m contain a pattern p, a type annotation σ, and a template expression e. A pattern p is a tree of pattern variables (assumed not to contain duplicates). Finally, an S-expression s is a tree of expressions or macro operators. The latter form is used to pass macros as arguments to other macros.

The syntax of λm may seem unfamiliar compared to the simple S-expressions of Scheme. After all, Scheme applications (s) look different from λm applications op⟦s⟧σ, and in Scheme, pattern variables are indistinguishable from program variables. However, given shape-annotated macro definitions, we can easily parse surface S-expression syntax into λm. We describe this process in Section 8.
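The macro-free fragment of this grammar (variables, λ-abstractions, and applications) is where α-equivalence is already well understood. As a minimal sketch, assuming a hypothetical Python encoding of that fragment, the following compares two fully expanded terms up to renaming of bound variables by tracking the depth of each binder. The names `Var`, `Lit`, `Lam`, `App`, `alpha_eq`, and `example` are illustrative assumptions, and the code does not model macros, patterns, or shape types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lit:
    value: object

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def alpha_eq(a, b, env_a=None, env_b=None, depth=0):
    """Compare two macro-free terms up to renaming of bound variables."""
    env_a = env_a or {}
    env_b = env_b or {}
    if isinstance(a, Var) and isinstance(b, Var):
        bound_a, bound_b = env_a.get(a.name), env_b.get(b.name)
        if bound_a is None and bound_b is None:
            return a.name == b.name   # both free: names must agree
        return bound_a == bound_b     # both bound: same binder depth
    if isinstance(a, Lit) and isinstance(b, Lit):
        return a.value == b.value
    if isinstance(a, Lam) and isinstance(b, Lam):
        # Record the depth at which each side binds its parameter.
        return alpha_eq(a.body, b.body,
                        {**env_a, a.param: depth},
                        {**env_b, b.param: depth},
                        depth + 1)
    if isinstance(a, App) and isinstance(b, App):
        return (alpha_eq(a.fn, b.fn, env_a, env_b, depth)
                and alpha_eq(a.arg, b.arg, env_a, env_b, depth))
    return False

def example(x, y, f):
    # ((λx. ((λy. λf. f y) (exp x))) (sqrt 2)), parameterized by its bound names.
    inner = Lam(y, Lam(f, App(Var(f), Var(y))))
    return App(Lam(x, App(inner, App(Var("exp"), Var(x)))),
               App(Var("sqrt"), Lit(2)))

expanded = example("x", "y", "f")
renamed = example("a", "b", "g")    # consistent renaming of all binders
captured = example("x", "f", "f")   # renaming y to f captures the reference to y

print(alpha_eq(expanded, renamed))   # True
print(alpha_eq(expanded, captured))  # False
```

This check only applies once expansion is complete, which is exactly the limitation described in Section 1: a term that still contains a macro application has no binding structure for `alpha_eq` to inspect unless the macro's interface supplies it.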


Similar resources

Fully-parameterized, first-class modules with hygienic macros

It is possible to define a formal semantics for configuration, elaboration, linking, and evaluation of fully-parameterized first-class modules with hygienic macros, independent compilation, and code sharing. This dissertation defines such a semantics making use of explicit substitution to formalize hygienic expansion and linking. In the module system, interfaces define the static semantics of m...

Hygienic Macros for ACL2

ACL2 is a theorem prover for a purely functional subset of Common Lisp. It inherits Common Lisp’s unhygienic macros, which are used pervasively to eliminate repeated syntactic patterns. The lack of hygiene means that macros do not automatically protect their producers or consumers from accidental variable capture. This paper demonstrates how this lack of hygiene interferes with theorem proving....

Refining Hygienic Macros for Modules and Separate Compilation

Genuine differences in the treatment of identifiers in block-structured languages and those that provide qualified names for accessing components of modules or aggregate data structures invalidate some of the assumptions hygienic macro systems are based on. We will investigate how these assumptions have to be changed, and the consequences for the construction of hygienic macro expanders. Macro ...

Honu: A syntactically extensible language

Honu is a new language that contains a system for extending its syntax with an interface built on concrete syntax. Honu combines an existing hygienic macro system with a novel use of a precedence parser to achieve a syntax that is algol-like while maintaining the power of s-expression based macros. We demonstrate how to build the parser and connect it to the underlying macro system.

Macros that Work Together - Compile-time bindings, partial expansion, and definition contexts

Racket is a large language that is built mostly within itself. Unlike the usual approach taken by non-Lisp languages, the self-hosting of Racket is not a matter of bootstrapping one implementation through a previous implementation, but instead a matter of building a tower of languages and libraries via macros. The upper layers of the tower include a class system, a component system, pedagogic v...

Publication date: 2008